-
Notifications
You must be signed in to change notification settings - Fork 5
some initial tweaks of the input format #132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, but I'd let @abirchfield decide whether to merge this.
As an additional comment, it would be good to add another appendix with JSON style guidelines for input files.
One of the goals of the format is that it will be non-repetitive to keep file size concise and enhance human readability. Notice that, with the proposed changes, the demo file has ballooned to about double its prior size. Imagine if we have 2000 buses, and each of them has the identical literal strings "id", "class", "name", "default_voltages", "voltage_base" rather than just an ordered list with known required items. Same thing if we have 300 GENROUs. We will end up with massive files with lots of repetitive text that are neither efficient for humans nor computers to read. It is a stylistic choice but I prefer the original. |
json parsers these days are quite fast, i don't think there would be any significant performance overhead to this. the main reason i think objects over arrays for device & node descriptions is the way to go is that they're more self-describing compared to the heterogeneous arrays used previously. additionally, another change i made was pulling out the optional fields we had already into the object we now use for describing these. this removes our specified optional fields from the same namespace that other consumers of this format would use to implement their own extensions, freeing us from ensuring new optional fields we add don't have names that conflict with those used by others. this is also beneficial, and not something that could've been done with the previous approach unless we had two separate objects in the array: one for us and the other for extensions. finally, the use of an object over a heterogeneous array is just more in line with people's expectations, but this may not be that convincing of an argument. |
Co-authored-by: pelesh <peless@ornl.gov>
I committed some changes. I think I am persuaded to use more of an object-oriented format rather than tables. In going through it, I realized some of the names were not consistent with nomenclature decisions we made since the original draft of the format. @superwhiskers feel free to take another pass through. |
so there's one major comment here. in the initial pull request, i removed the
i notice you've added it back. was my assumption incorrect? |
i'd also like to start some discussion of the output format. aside from csv, some other alternatives that would work are json lines or perhaps the same thing with messagepack. this may not be the best choice if streaming output is not a concern we have (the primary benefit of such formats would be the ability to easily stream output from one area to another) but they also provide the benefits of json and messagepack, which is that they are better specified than something like csv, but issues with that may not come up for our use case. |
It is typical and convenient to have a separate field in a case file for the case date. |
What this would look like in JSON lines? Keep in mind we may have 100k or more time points for some studies. I would not recommend repeating all the output channel names for each time point. I guess the simplest would be just to add a [ and ] to the beginning and end of each of the lines in my example, so that every line is an array representing a time point, except the first line is an array with channel labels. |
Of course the first thing I would do for my own workflow is write a script to convert it to CSV so I can open it in Excel... |
ah. if your main use case is opening it in excel then sticking with csv would probably be the better choice. json lines or something like that is probably excessive and wouldn't be of any benefit to you. the only reason i bring it up is that i've used it in the past for some simulation work and retaining the output channel names makes every single pipeline slightly more self-describing (the size increase is negligible if the json is compressed). additionally, there are many tools for working with json. |
i've explicitly specified a date format (YYYY-MM-DD) so that there's no confusion about the date being represented by a case with this field and marked the field as optional as it wasn't used in the example and it may not always be applicable. i've also removed the "uncommon" label applied to some fields as it wasn't clear to me what that was there to indicate. suggesting that such a field overrides the default value seemed enough to me. the remaining things i'd like to complete before this is ready to merge / ready for final review are:
|
Generators commonly specify data on their own power base, overriding the system default. Very rarely would a device override the frequency base, but it could. |
User should be able to specify time as well as date. Generally this could be written automatically by a program writing the file. |
would it not be more appropriate to call it |
The output CSV format is just an example. If you want to add any specifics go ahead. |
For default values of each of the bus classes, it will be 1.0 for Vr and Va, and it will be 0.0 for all other variables. This is probably easier said in a note below the table rather than cluttering the table. |
my only confusion regarding the existing output format is the meaning of the "solver status" column. it's not clear to me how portable such a field would be. it may be better to just leave that for another output channel (that is, not bundling it into the same file containing samples of all of the monitor channels). |
I would prefer all the output be together. Maybe the solution is to add an optional "mon" field to the header section, for system-level outputs. This could include solver status, but could also include various aggregate measurements. Time, of course, would still always be the first channel. |
how do you feel about a json lines derived output format like this? each timestep would be like this: {"time": 0, "solver": { /* solver-dependent object */ }, "buses": [/* bus objects */], "devices": [/*device objects*/]} the solver object contains status arbitrary status information dependent upon the solver used (as this kind of state can vary between solvers). the bus objects are as follows: {"number": 0, "monitored": {"variable": /* state */}} the device objects are as follows: {"ports": /* same as how they are specified */, "id": /* some value */, "monitored": {/* same as above*/}} this is a quick sketch. i decided to move back to json lines as i couldn't think of a single good way to render this information down to a csv without recreating all of the niceties of a format like json. suggest any changes you think would improve it. |
The basic idea for the output is: you define what variables you want to monitor and the program exports a matrix of those variables across time. CSV is most useful for that, so I at least want a CSV output option. We can add additional, more complicated output formats in the future if they will be useful for anyone. |
absolutely. my main thought here was that it didn't seem like i could make a single csv-derived format that would fit most use cases so i propose that json lines format and a tool (a python script or something) that converts it to a csv the way you'd want it as proposing a csv like what you were thinking of seems too specific to make sense proposing as a "standard" output format (i'm mainly thinking of the solver data here). alternatively, we just cut the output format from this specification and develop one outside of it before adding one to this spec. |
we may need to make a separate pull request then to make a tweak or two to the output format as i don't know how well it'd work as-is (primarily talking about the solver status column still). |
description
this pull request proposes some tweaks to the input format itself, and to the language of the specification
proposed changes
here's a quick summary of the changes made:
further comments
case_
prefix fromcase_name
,case_description
,case_comments
in theheader
section?case_date
field from theheader
as it seems to me that this role could be fulfilled by eithercase_description
orcase_comments
, but was this assumption incorrect?